AITopics

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.24)
North America > Canada > Ontario > Toronto (0.05)
North America > United States > New York (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Neural Information Processing SystemsFeb-9-2026, 06:33:46 GMT

Bridging Discrete and Backpropagation: Straight-Through and Beyond Liyuan Liu Chengyu Dong Xiaodong Liu Bin Y u Jianfeng Gao Microsoft Research

Backpropagation, the cornerstone of deep learning, is limited to computing gradients for continuous variables. This limitation poses challenges for problems involving discrete latent variables. To address this issue, we propose a novel approach to approximate the gradient of parameters involved in generating discrete latent variables. First, we examine the widely used Straight-Through (ST) heuristic and demonstrate that it works as a first-order approximation of the gradient. Guided by our findings, we propose ReinMax, which achieves second-order accuracy by integrating Heun's method, a second-order numerical method for solving ODEs. ReinMax does not require Hessian or other second-order derivatives, thus having negligible computation overheads. Extensive experimental results on various tasks demonstrate the superiority of ReinMax over the state of the art.

approximation, artificial intelligence, machine learning, (18 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.61)

Neural Information Processing SystemsFeb-7-2026, 15:55:10 GMT

196f5641aa9dc87067da4ff90fd81e7b-Supplemental.pdf

constraint, objective, simplex constraint, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

arXiv.org Artificial IntelligenceDec-4-2025

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Zheng, Chujie, Dang, Kai, Yu, Bowen, Li, Mingze, Jiang, Huiqiang, Lin, Junrong, Liu, Yuqiong, Lin, Hao, Wu, Chencan, Hu, Feng, Yang, An, Zhou, Jingren, Lin, Junyang

This paper proposes a novel formulation for reinforcement learning (RL) with large language models, explaining why and under what conditions the true sequence-level reward can be optimized via a surrogate token-level objective in policy gradient methods such as REINFORCE. Specifically, through a first-order approximation, we show that this surrogate becomes increasingly valid only when both the training-inference discrepancy and policy staleness are minimized. This insight provides a principled explanation for the crucial role of several widely adopted techniques in stabilizing RL training, including importance sampling correction, clipping, and particularly Routing Replay for Mixture-of-Experts (MoE) models. Through extensive experiments with a 30B MoE model totaling hundreds of thousands of GPU hours, we show that for on-policy training, the basic policy gradient algorithm with importance sampling correction achieves the highest training stability. When off-policy updates are introduced to accelerate convergence, combining clipping and Routing Replay becomes essential to mitigate the instability caused by policy staleness. Notably, once training is stabilized, prolonged optimization consistently yields comparable final performance regardless of cold-start initialization. We hope that the shared insights and the developed recipes for stable RL training will facilitate future research.

large language model, machine learning, reinforcement learning, (18 more...)

2512.01374

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

arXiv.org Artificial IntelligenceNov-5-2025

Linear-Time Demonstration Selection for In-Context Learning via Gradient Estimation

Zhang, Ziniu, Zhang, Zhenshuo, Li, Dongyue, Wang, Lu, Dy, Jennifer, Zhang, Hongyang R.

This paper introduces an algorithm to select demonstration examples for in-context learning of a query set. Given a set of $n$ examples, how can we quickly select $k$ out of $n$ to best serve as the conditioning for downstream inference? This problem has broad applications in prompt tuning and chain-of-thought reasoning. Since model weights remain fixed during in-context learning, previous work has sought to design methods based on the similarity of token embeddings. This work proposes a new approach based on gradients of the output taken in the input embedding space. Our approach estimates model outputs through a first-order approximation using the gradients. Then, we apply this estimation to multiple randomly sampled subsets. Finally, we aggregate the sampled subset outcomes to form an influence score for each demonstration, and select $k$ most relevant examples. This procedure only requires pre-computing model outputs and gradients once, resulting in a linear-time algorithm relative to model and training set sizes. Extensive experiments across various models and datasets validate the efficiency of our approach. We show that the gradient estimation procedure yields approximations of full inference with less than ${1}\%$ error across six datasets. This allows us to scale up subset selection that would otherwise run full inference by up to ${37.7}\times$ on models with up to $34$ billion parameters, and outperform existing selection methods based on input embeddings by ${11}\%$ on average.

demonstration, large language model, machine learning, (21 more...)

2508.19999

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Data Science > Data Mining (0.93)
(2 more...)

Kokhlikyan, Narine, Chaudhuri, Kamalika, Mahloujifar, Saeed

Z0-Inf: Zeroth Order Approximation for Data Influence

arXiv.org Artificial IntelligenceOct-15-2025

A critical aspect of analyzing and improving modern machine learning systems lies in understanding how individual training examples influence a model's predictive behavior. Estimating this influence enables critical applications, including data selection and model debugging; in particular, self-influence, which quantifies the influence of a training point on itself, has found many uses in data quality assessment and outlier detection. Existing methods for measuring data influence, however, are often impractical for large models due to low accuracy or prohibitive computational costs: most approaches either provide poor approximations or rely on gradients and inverse-Hessian computations that remain challenging to scale. In this work, we introduce a highly efficient zeroth-order approximation for estimating the influence of training data that requires only a fraction of the time and memory footprint of prior methods. Notably, our method relies solely on loss values of intermediate checkpoints on the training and test data, along with the checkpoints themselves, making it broadly applicable even when the loss function of interest is non-differentiable. Beyond its computational efficiency, our approach achieves superior accuracy in estimating self-influence and comparable or improved accuracy in estimating train-test influence for fine-tuned large language models, enabling scalable and practical analysis of how training data shapes model behavior.

approximation, large language model, machine learning, (22 more...)

2510.11832

Genre: Research Report > Experimental Study (0.46)

Industry: Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.57)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)

Neural Information Processing SystemsOct-8-2025, 08:01:20 GMT

28b5dfc51e5ae12d84fb7c6172a00df4-Paper-Conference.pdf

approximation, first-order approximation, reinmax, (15 more...)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial IntelligenceOct-8-2025

Fast Leave-One-Out Approximation from Fragment-Target Prevalence Vectors (molFTP) : From Dummy Masking to Key-LOO for Leakage-Free Feature Construction

Godin, Guillaume

A fundamental question for users of predictive models is: how good is the training data (1)? One way to approach this is by delineating the model's applicability domain. A second safeguard is to prevent data leakage (2), which motivates deduplication and proper validation protocols (3) . In practice, it is standard to use cross-validation or time-series split such as SIMPD (4) . Beyond sample-level leakage (molecules crossing folds), we must also consider feature leakage (when features inadvertently encode information about held-out molecules) (2) . We return to this point in the related Work section.

artificial intelligence, data mining, machine learning, (18 more...)